Solvation vs. freezing in a heteropolymer globule 
Abstract

We address the response of a random heteropolymer to preferential solvation of certain monomer 
types at the globule-solvent interface. For each set of monomers that can comprise the molecule's 
surface, we represent the ensemble of allowed configurations by a Gaussian distribution of energy 
levels, whose mean and variance depend on the set's composition. Within such a random energy 
model, mean surface composition is proportional to solvation strength under most conditions. The 
breadth of this linear response regime arises from approximate statistical independence of surface 
and volume energies. For a diverse set of monomer types, the excess of solvophilic monomers at 
the surface is large only for very strong solvent preference, even in the ground state. 



A polymer chain collapses into a compact globular state in poor solvent. A chain with 
quenched sequence of chemically different units can further undergo a freezing transition, in 
which the freedom of chain shape fluctuations is sacrificed for the choice of optimal spatial 
contacts between monomers. This freezing, or folding, is subject to constraints imposed by 
chain connectivity, quenched sequence, and excluded volume. The effects of frustration due 
to these constraints are well understood [1]. However well developed, current theories
of heteropolymer freezing ignore one obvious fact, namely that some chain segments are 
more favorably solvated than others. By contrast, much of the protein literature presumes 
preferential solvation to be a leading determinant of tertiary structure. It is commonly 
held that a protein's surface is composed of hydrophilic units, while hydrophobic units are 
invariably buried in the core. 

For heteropolymers in general, it is clear that energy gained through preferential exposure 
of solvophilic units comes at a cost. Constraining particular units to the globule surface re- 
stricts the selection of contacting monomer pairs inside the globule, exacerbating frustration. 
In other words, when the sequence of units has not been designed in an intelligent way, as 
is the case for the random sequence heteropolymer, preferential exposure may significantly 
reduce the availability of low energy conformations. The question thus arises: what is the 
effect of solvation on heteropolymer freezing, or more specifically, how large an excess of 
solvophilic units at the globule surface is consistent with freezing? 

This question was first discussed by two of us [4] in the context of studies of mechanical
stretching of heteropolymers. Using a replica approach, we found that solvent preference
of strength $\Gamma$ for particular monomers at the surface lowers the ground state energy $E_g$ by an
amount $\sim K\Gamma^2/T_{\rm fr}$. Here, $K \sim N^{2/3}$ is the number of monomers exposed to the solvent,
$N$ is the number of monomers comprising the molecule, and $T_{\rm fr}$ is the freezing temperature
below which the ground state dominates. For strong solvation this approach apparently
fails. In particular, $K\Gamma^2/T_{\rm fr}$ can exceed the maximum possible solvation energy (without
distortion of globule shape), $K\Gamma$, corresponding to a completely solvophilic surface.

This Letter describes a more comprehensive treatment of solvation based on the Random
Energy Model (REM) of Derrida. It is well known that this simplest model
of freezing in spin glasses captures remarkably well the essential features of heteropolymer
freezing in three dimensions. The mapping of the heteropolymer problem onto the REM is
achieved by approximating the energies of all $M = e^{sN}$ different conformations of a random
sequence heteropolymer as $M$ independent random variables drawn from the Gaussian
distribution $w(E) \propto \exp[-(E - \bar{E})^2/2N\Delta^2]$. In the volume approximation, when the energy of
every conformation is solely due to $\sim N$ monomer-monomer contacts, this distribution is
fully determined by the mean $B$ and variance $\delta B^2$ of contact energies, so that $\bar{E} = NB$ and
$\Delta^2 = \delta B^2$.

The simplest way to incorporate the surface into this picture is to imagine that contacts
between surface monomers and solvent are also, in effect, statistically independent random
variables. The variance of surface energy, $K\Gamma^2$, then adds to that of volume energy, so that
$N\Delta^2 = N\delta B^2 + K\Gamma^2$. We use the saddle point of the partition function,
$Z = e^{sN}\int dE\, w(E)\, e^{-E/T}$, to estimate the energy of representative conformations,
$E = \bar{E} - N\delta B^2/T - K\Gamma^2/T$. The lower bound of
the spectrum is reached when $e^{sN} w(E) \sim 1$, yielding a typical ground state energy:

$$E_g^{\rm (typ)} \simeq \bar{E} - \sqrt{2s}\, N\Delta \simeq \bar{E} - \sqrt{2s}\, N\delta B - \sqrt{\frac{s}{2}}\,\frac{K\Gamma^2}{\delta B}. \qquad (1)$$

Correspondingly, $T_{\rm fr} \simeq \delta B/\sqrt{2s}$. The final term in Eq. (1), i.e., the change in ground state
energy due to solvation, is (within a factor of order unity) $-K\Gamma^2/T_{\rm fr}$, just as found in the
replica approach of Ref. [4].
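The estimate above can be checked numerically. The following is a minimal sketch, assuming the total variance is $N\delta B^2 + K\Gamma^2$ and using illustrative parameter values that do not come from the text (`N`, `s`, `dB`, `gamma` are hypothetical). It compares the closed-form solution of $e^{sN}w(E) \sim 1$ with its first-order expansion, and confirms that the solvation shift equals $-K\Gamma^2/2T_{\rm fr}$.

```python
import math

def rem_ground_state(N, K, s, dB, gamma, E_mean=0.0):
    """Typical ground state from e^{sN} w(E) ~ 1 for a Gaussian w(E).

    Assumes independent volume and surface contributions, so that the
    total variance is N*dB**2 + K*gamma**2.
    Returns (closed-form value, first-order expansion in K*gamma^2/(N*dB^2)).
    """
    variance = N * dB**2 + K * gamma**2
    exact = E_mean - math.sqrt(2.0 * s * N * variance)
    expanded = (E_mean - math.sqrt(2.0 * s) * N * dB
                - math.sqrt(s / 2.0) * K * gamma**2 / dB)
    return exact, expanded

# Illustrative (hypothetical) parameters, with sqrt(2s) << 1
N = 1000
K = round(N ** (2.0 / 3.0))      # surface size, K ~ N^(2/3)
s, dB, gamma = 0.02, 1.0, 0.5

exact, expanded = rem_ground_state(N, K, s, dB, gamma)
T_fr = dB / math.sqrt(2.0 * s)   # freezing temperature
volume_part = -math.sqrt(2.0 * s) * N * dB
solvation_shift = expanded - volume_part

print("closed form      :", exact)
print("expansion        :", expanded)
print("solvation shift  :", solvation_shift)
print("-K gamma^2/2T_fr :", -0.5 * K * gamma**2 / T_fr)
```

The expansion slightly overestimates the magnitude of the shift, since the square root is concave in the surface variance.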

There are several reasons to be skeptical of the suggested independence of surface and 
volume energies. First, removing a specific set of monomers from the globule interior to the 
surface modifies the distribution of contacting monomers. Secondly, there are a finite number 
of solvophilic monomers in a given molecule (possibly fewer than K). When a large fraction is 
placed on the surface, the supply of solvophilic monomers is strongly depleted, and solvation 
energy saturates. Finally, certain choices of surface monomers constrain configuration space 
more strongly than others. 

We examine these effects using a model in which each monomer is labeled by a quenched
variable $\sigma$. When a monomer with label $\sigma$ resides on the surface, it is assigned a solvation
energy $-\Gamma\sigma$. Solvophilic species are thus characterized by $\sigma > 0$, while $\sigma < 0$ for solvophobic
species. In its total effect the solvent preference $\Gamma$ can be viewed as an external field
that couples linearly to net surface composition, $C_{\rm surf} = \sum_{i \in {\rm surf}} \sigma_i$. Within the globule, a
contacting pair of monomers, of type $\sigma$ and $\sigma'$, is ascribed energy $B_{\sigma\sigma'} = B + \delta B\,\sigma\sigma'$. For
simplicity we restrict attention to distributions of monomer types, $p(\sigma)$, with zero mean and
unit variance:

$$\int d\sigma\, p(\sigma)\,\sigma = 0, \qquad \int d\sigma\, p(\sigma)\,\sigma^2 = 1. \qquad (2)$$



Imagine that a certain set of monomers is constrained to sit on the surface. We denote
this set as $G$. In our model the energetic consequences of such a constraint depend only on
the distribution, $f(\sigma)$, of monomer types in $G$. For example, the effective distribution of
contacting monomers (i.e., those remaining inside the globule when $G$ is removed), $p_G^{\rm eff}(\sigma)$,
may be written as $p_G^{\rm eff}(\sigma) = p(\sigma) + \frac{K}{N}[p(\sigma) - f(\sigma)]$. The effective mean and variance of
contact energies are then $B_G^{\rm eff} = B + (K/N)\alpha_G$ and $(\delta B_G^{\rm eff})^2 = \delta B^2 + (K/N)\beta_G$, respectively.
For distributions satisfying Eq. (2), $\alpha_G = 0$ and $\beta_G = 2\delta B^2[1 - \int d\sigma\,\sigma^2 f(\sigma)]$. Similarly, the
solvation energy per surface monomer is $\gamma_G = -\Gamma\int d\sigma\,\sigma f(\sigma)$.
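The first-order expressions for $\alpha_G$ and $\beta_G$ can be illustrated with discrete monomer types. The sketch below assumes that the two monomers of a contact are drawn independently from $p_G^{\rm eff}(\sigma)$; the three-type distribution, the surface composition `f`, and the ratio `K_over_N` are hypothetical choices made only for illustration.

```python
import math

def contact_moments(p, f, K_over_N, B, dB):
    """Mean and variance of the contact energy B + dB*s*s' when s and s'
    are drawn independently from p_eff = p + (K/N)(p - f).
    p, f: dict mapping monomer type sigma -> probability."""
    types = set(p) | set(f)
    p_eff = {s: p.get(s, 0.0) + K_over_N * (p.get(s, 0.0) - f.get(s, 0.0))
             for s in types}
    m1 = sum(w * s for s, w in p_eff.items())        # first moment of p_eff
    m2 = sum(w * s * s for s, w in p_eff.items())    # second moment of p_eff
    mean = B + dB * m1**2
    var = dB**2 * (m2**2 - m1**4)
    return mean, var

def beta_G(f, dB):
    """First-order coefficient: beta_G = 2 dB^2 [1 - sum_sigma sigma^2 f(sigma)]."""
    return 2.0 * dB**2 * (1.0 - sum(w * s * s for s, w in f.items()))

# Hypothetical three-type distribution with zero mean and unit variance
a = math.sqrt(1.5)
p = {-a: 1.0 / 3.0, 0.0: 1.0 / 3.0, a: 1.0 / 3.0}
f = {-a: 0.1, 0.0: 0.3, a: 0.6}          # solvophilic-rich surface
B, dB, K_over_N = -1.0, 1.0, 0.01

mean, var = contact_moments(p, f, K_over_N, B, dB)
print("mean shift        :", mean - B)   # second order in K/N, so alpha_G = 0
print("variance (direct) :", var)
print("variance (1st ord):", dB**2 + K_over_N * beta_G(f, dB))
print("beta_G, binary f  :", beta_G({1.0: 0.9, -1.0: 0.1}, dB))
```

For any binary choice of `f` with $\sigma = \pm 1$ the second moment is fixed at one, so `beta_G` returns zero regardless of composition.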

We express the number of accessible conformations when all monomers in $G$ are confined
to the surface as $M_G = e^{sN - K\omega_G}$. Here, $\omega_G$ is the entropy loss per surface monomer for
a particular choice of $G$. Though smaller than $M$, $M_G$ is still exponentially large in $N$. In general $\omega_G$
is not simply a functional of $f(\sigma)$, but is instead a complicated function of $G$. We will assume
that for any specific $f(\sigma)$, the average of $\omega_G$ over all consistent realizations of $G$ is a constant
independent of $f(\sigma)$. In order to recover the appropriate total number of conformations after
summing over $G$, we choose this constant to be $\bar\omega = K^{-1}\ln\binom{N}{K} \simeq \ln(Ne/K)$.
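The quality of the approximation $K^{-1}\ln\binom{N}{K} \simeq \ln(Ne/K)$ is easy to check. The sketch below evaluates the binomial coefficient through log-gamma functions; the values of `N` and `K` are illustrative only.

```python
import math

def omega_bar_exact(N, K):
    """K^{-1} ln C(N, K), evaluated with log-gamma to avoid overflow."""
    ln_binom = math.lgamma(N + 1) - math.lgamma(K + 1) - math.lgamma(N - K + 1)
    return ln_binom / K

def omega_bar_approx(N, K):
    """Leading estimate ln(N e / K), valid for 1 << K << N."""
    return math.log(N * math.e / K)

# Illustrative sizes with K = N^(2/3)
N, K = 10**6, 10**4
exact = omega_bar_exact(N, K)
approx = omega_bar_approx(N, K)
print(exact, approx, approx - exact)
```

The leading correction per surface monomer is of order $K/2N$, so the two values agree more closely as $K/N$ decreases.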




We consider a separate REM for each possible choice of surface G. In doing so, we assume 
that allowed conformations in the corresponding subensembles are sufficiently diverse that 
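their energies are Gaussian distributed, with

$$w_G(E) \propto \exp\left[-\frac{(E - NB_G^{\rm eff} - K\gamma_G)^2}{2N(\delta B_G^{\rm eff})^2}\right]. \qquad (3)$$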



Ultimately, we must reconstruct the full ensemble of compact chain fluctuations by super- 
posing all possible subensembles, i.e., by summing over G. This convolution of REMs, each 
representing a distinct choice of G, constitutes our caricature of a random heteropolymer 
with solvated surface. 

Consider the ground state of the full ensemble, i.e., the lowest of subensemble ground
state energies, $E_g = \min_G E_g(G)$. Interfacial energy clearly favors a solvophilic surface, but
does it yield the lowest ground state? Let us first examine a typical value of $E_g(G)$ for
specific $G$. The condition $M_G\, w_G[E_g^{\rm (typ)}(G)] \sim 1$ yields:

$$E_g^{\rm (typ)}(G) \simeq NB - N\sqrt{2s}\,\delta B + K\epsilon_{\rm surf}(G), \qquad (4)$$
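with

$$\epsilon_{\rm surf}(G) = \gamma_G - \sqrt{\frac{s}{2}}\,\frac{\beta_G}{\delta B} + \frac{\delta B}{\sqrt{2s}}\,\omega_G. \qquad (5)$$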




This most probable ground state energy is indeed minimized by an exclusively solvophilic
choice of $G$. There are many distinct choices of $G$, however, leading to the same value of
$E_g^{\rm (typ)}(G)$. It is therefore crucial to account for variations in $E_g(G)$ among similar subensembles.
According to the statistics of extreme values, the probability that the lowest energy in
a particular subensemble deviates from $E_g^{\rm (typ)}(G)$ by an amount $\delta E_g(G) = E_g(G) - E_g^{\rm (typ)}(G)$
is

$$W[\delta E_g(G)] \propto \exp\left[\frac{\delta E_g(G)}{T_{\rm fr}} - \exp\left(\frac{\delta E_g(G)}{T_{\rm fr}}\right)\right]. \qquad (6)$$

Compared to the Gaussian distribution of energies within a subensemble, $W[\delta E_g(G)]$ decays
very slowly for $\delta E_g(G) < 0$. When many subensembles share a common value of $E_g^{\rm (typ)}(G)$,
their range of $E_g(G)$ will be broad.
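The asymmetry of the ground state distribution can be seen in a small simulation. This is a rough sketch rather than a test of the model: standard-normal variables stand in for the energy levels of one subensemble, and the level count `M`, the number of trials, and the seed are arbitrary choices. A negative skewness of the sampled minima indicates a tail that extends further toward low energies than toward high ones.

```python
import random
import statistics

def sample_ground_states(M, trials, seed=0):
    """Lowest of M independent standard-normal energies, repeated `trials` times."""
    rng = random.Random(seed)
    return [min(rng.gauss(0.0, 1.0) for _ in range(M)) for _ in range(trials)]

def skewness(xs):
    """Third standardized moment of a sample."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - mu) ** 3 for x in xs) / (len(xs) * sd ** 3)

minima = sample_ground_states(M=500, trials=2000)
print("mean ground state:", statistics.fmean(minima))
print("skewness         :", skewness(minima))
```

Convergence to the limiting extreme value form is slow in `M`, so the sampled skewness is smaller in magnitude than its asymptotic value.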

The vast majority of subensembles have unremarkable surface energy. The number with
$|\epsilon_{\rm surf}(G)| \ll \Gamma$ and $\delta E_g(G) = \delta$ is thus roughly $e^{\bar\omega K} W(\delta)$. Since the tail of $W[\delta E_g(G)]$ is
exponential, we expect $O(1)$ of the subensembles with insignificant surface energy to have
$|\delta E_g(G)| = O(K)$. In other words, the variations in volume energy among these subensembles
are comparable in magnitude to the largest possible surface energy. This result may be
viewed as the consequence of an effective entropy that remains important even at low temperature.
The collection of subensembles with appreciable surface energy is much smaller
than $e^{\bar\omega K}$, and its entropy is correspondingly low. The ground state surface is uniformly
solvophilic only when solvent preference is strong enough to offset this entropic cost.

Because $w_G(E)$ depends only on $f(\sigma)$, it is natural to group all subensembles with the
same number density of monomer types. We have shown that accounting for the disparity
in sizes of these groups is essential. The number of ways to choose $K$ monomers with
distribution $f(\sigma)$ from a pool of $N$ monomers with distribution $p(\sigma)$ is $e^{Ns\{f\}}$, where

$$s\{f\} = -\int d\sigma\, p(\sigma)\,[\phi\ln\phi + (1 - \phi)\ln(1 - \phi)]. \qquad (7)$$

The density $\phi(\sigma) = Kf(\sigma)/Np(\sigma)$ and its corresponding entropy, $s\{f\}$, are precisely those
relevant for Langmuir adsorption of an ideal gas mixture onto $K$ distinguishable sites.
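For discrete monomer types the counting behind $s\{f\}$ can be compared with an exact product of binomial coefficients. The sketch below does this for a binary pool; the sizes `N`, `K` and the surface composition `f` are hypothetical, and the helper names are introduced here only for illustration.

```python
import math

def ln_binom(n, k):
    """ln C(n, k) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def langmuir_log_count(p, f, N, K):
    """N * s{f} with phi = K f / (N p), for discrete types (dict sigma -> prob)."""
    total = 0.0
    for sigma, p_s in p.items():
        phi = K * f.get(sigma, 0.0) / (N * p_s)
        for x in (phi, 1.0 - phi):
            if x > 0.0:                      # x ln x -> 0 as x -> 0
                total -= N * p_s * x * math.log(x)
    return total

def exact_log_count(p, f, N, K):
    """ln of the number of ways to pick K f(sigma) monomers of each type
    from the N p(sigma) monomers of that type."""
    return sum(ln_binom(round(N * p_s), round(K * f.get(sigma, 0.0)))
               for sigma, p_s in p.items())

N, K = 100000, 2000
p = {1: 0.5, -1: 0.5}
f = {1: 0.7, -1: 0.3}
approx = langmuir_log_count(p, f, N, K)
exact = exact_log_count(p, f, N, K)
print(approx, exact, (approx - exact) / exact)
```

The two values differ only by subextensive Stirling corrections, which is why the entropy form is adequate for large $K$.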

At and above the freezing temperature, equilibrium of a subensemble group is dominated
by the saddle point of the partition function

$$Z\{f\} = e^{Ns - K\bar\omega + Ns\{f\}} \int dE\, w_G(E)\, e^{-E/T}. \qquad (8)$$



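The group free energy, $F\{f\} = -T\ln Z\{f\}$, is then

$$F\{f\} \simeq \bar{F} + KT\left[\bar\omega - \frac{\delta B^2}{T^2} + \int d\sigma\, f(\sigma)\,\eta(\sigma)\right] + NT\int d\sigma\, p(\sigma)\,[\phi\ln\phi + (1 - \phi)\ln(1 - \phi)], \qquad (9)$$

where

$$\eta(\sigma) = \frac{\delta B^2}{T^2}\,\sigma^2 - \frac{\Gamma}{T}\,\sigma. \qquad (10)$$

Volume terms independent of $f(\sigma)$ have been collected as $\bar{F}/N = B - Ts - \delta B^2/2T$. According
to Eqs. (9) and (10), the binding energy in our analogy to Langmuir adsorption varies
with particle type $\sigma$ as $(\delta B^2/T)\sigma^2 - \Gamma\sigma$.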

The full partition function of the polymer, a sum over all $Z\{f\}$, is dominated by the
subensemble group with lowest free energy:

$$Z = \sum_{\{f\}} Z\{f\} \simeq Z\{f^*\}. \qquad (11)$$



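We calculate the optimal surface distribution, $f^*(\sigma)$, variationally, using a Lagrange multiplier
to enforce proper normalization of $\phi(\sigma)$. We thereby obtain

$$\phi^*(\sigma) = \frac{K f^*(\sigma)}{N p(\sigma)} = \frac{1}{1 + \Lambda e^{\eta(\sigma)}}, \qquad (12)$$

where the constant $\Lambda$ is determined by normalization,

$$\frac{N}{K}\int d\sigma\, \frac{p(\sigma)}{1 + \Lambda e^{\eta(\sigma)}} = 1. \qquad (13)$$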

Finally, evaluating $F\{f\}$ at $f^*(\sigma)$ yields our approximation for the total free energy,
$F \simeq \bar{F} + F_{\rm surf}$, with

$$\frac{F_{\rm surf}}{KT} = 1 - \frac{\delta B^2}{T^2} - \ln\frac{\Lambda K}{N} - \frac{N}{K}\int d\sigma\, p(\sigma)\ln\left(\frac{1 + \Lambda e^{\eta(\sigma)}}{\Lambda e^{\eta(\sigma)}}\right). \qquad (14)$$

Eqs. (12)-(14), appropriate for $T > T_{\rm fr}$, are our principal results. They express the equilibrium
distribution of monomer types on the polymer surface and the corresponding interfacial free
energy density in terms of model parameters and an effective fugacity for surface monomers,
$\Lambda$.
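For a discrete set of monomer types, the normalization condition can be solved for $\Lambda$ by bisection. The sketch below assumes the forms $\phi^* = 1/(1 + \Lambda e^{\eta})$ with $\eta(\sigma) = (\delta B/T)^2\sigma^2 - (\Gamma/T)\sigma$, normalization $(N/K)\sum_\sigma p(\sigma)\phi^*(\sigma) = 1$, and $F_{\rm surf}/KT = 1 - \delta B^2/T^2 - \ln(\Lambda K/N) - (N/K)\sum_\sigma p(\sigma)\ln[1 + e^{-\eta(\sigma)}/\Lambda]$. All parameter values are hypothetical, and $K$ is chosen small relative to $N$ so that depletion is weak.

```python
import math

def eta(sigma, dB, gamma, T):
    """Reduced surface binding energy of monomer type sigma."""
    return (dB / T) ** 2 * sigma ** 2 - (gamma / T) * sigma

def solve_surface(p, N, K, dB, gamma, T):
    """p: dict sigma -> probability. Returns (Lambda, f_star, F_surf)."""
    etas = {s: eta(s, dB, gamma, T) for s in p}

    def coverage(ln_lam):
        # (N/K) * sum_sigma p / (1 + Lambda e^{eta}); equals 1 at the solution
        return (N / K) * sum(p[s] / (1.0 + math.exp(ln_lam + etas[s])) for s in p)

    lo, hi = -50.0, 50.0     # coverage decreases monotonically with ln(Lambda)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if coverage(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    ln_lam = 0.5 * (lo + hi)

    f_star = {s: (N / K) * p[s] / (1.0 + math.exp(ln_lam + etas[s])) for s in p}
    log_terms = sum(p[s] * math.log1p(math.exp(-ln_lam - etas[s])) for s in p)
    F_surf = K * T * (1.0 - (dB / T) ** 2 - ln_lam - math.log(K / N)
                      - (N / K) * log_terms)
    return math.exp(ln_lam), f_star, F_surf

# Hypothetical binary example in the weak-depletion regime (K << N)
N, K = 10**6, 10**3
dB, gamma, T = 0.3, 0.5, 1.0
p = {1: 0.5, -1: 0.5}
lam, f_star, F_surf = solve_surface(p, N, K, dB, gamma, T)
print("Lambda        :", lam)
print("f*(+1), f*(-1):", f_star[1], f_star[-1])
print("F_surf / (K T):", F_surf / (K * T))
print("-ln cosh(G/T) :", -math.log(math.cosh(gamma / T)))
```

In this regime the surface excess $f^*(+1) - f^*(-1)$ comes out close to $\tanh(\Gamma/T)$, with corrections of order $K/N$.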



In order to make these results concrete we consider some limiting cases and specific forms
of $p(\sigma)$. First, let us assume that preferential solvation does not lead to a significant depletion
of any monomer type inside the globule, so that $Kf(\sigma) \ll Np(\sigma)$ for every $\sigma$. Then, Eq. (12)
requires that $\Lambda e^{\eta(\sigma)} \gg 1$, simplifying the above expressions to yield $f^*(\sigma) \propto p(\sigma)\exp[-\eta(\sigma)]$
and

$$\frac{F_{\rm surf}}{KT} = -\frac{\delta B^2}{T^2} - \ln\int d\sigma\, p(\sigma)\, e^{-\eta(\sigma)}. \qquad (15)$$
To simplify this result even further, let us consider a binary distribution, $p(\sigma) =
(1/2)[\delta(\sigma + 1) + \delta(\sigma - 1)]$, which corresponds to the minimum of chemical diversity. In
this case depletion is invariably weak, since taking $K$ monomers away to the surface cannot
exhaust the total stock, $N/2$, of either monomer type. Eq. (15) then trivially yields

$$F_{\rm surf} = -KT\ln\cosh(\Gamma/T). \qquad (16)$$

For $\Gamma/T \ll 1$, $F_{\rm surf} \simeq -K\Gamma^2/2T$, precisely as obtained by assuming statistical independence
of surface and volume. Since net surface composition is conjugate to solvation strength, its
equilibrium value may be computed by differentiating Eq. (16) with respect to $\Gamma$, yielding:

$$\langle C_{\rm surf}\rangle = -\frac{\partial F_{\rm surf}}{\partial\Gamma} = K\tanh(\Gamma/T) \simeq \frac{K\Gamma}{T}. \qquad (17)$$
In this limit net surface composition is proportional to the "field" $\Gamma$. The above results
may therefore be understood in simple terms as a manifestation of linear response. From
Eq. (17) we identify a susceptibility $\chi = K/T$, corresponding to surface fluctuations of
size $\langle C_{\rm surf}^2\rangle_{\Gamma=0} = K$ in the absence of solvation. In other words, the excess of solvophilic
monomers at the surface is governed by $K$ effectively independent random variables. This
simple behavior results directly from the prevalence of variations in volume energy over surface
interactions. But when $\Gamma > T$, solvation wins out. Linear response then breaks down
due to saturation, as $F_{\rm surf}/K$ and $C_{\rm surf}$ approach their limiting values of $-\Gamma$ and $K$.
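The crossover from linear response to saturation can be tabulated directly. The sketch below assumes the binary-distribution forms $F_{\rm surf} = -KT\ln\cosh(\Gamma/T)$ and $C_{\rm surf} = K\tanh(\Gamma/T)$; the values of `K`, `T`, and the solvation strengths are illustrative.

```python
import math

def f_surf_binary(K, gamma, T):
    """Assumed binary form: F_surf = -K T ln cosh(gamma / T)."""
    return -K * T * math.log(math.cosh(gamma / T))

def c_surf_binary(K, gamma, T):
    """Net surface composition, C_surf = -dF_surf/dgamma = K tanh(gamma / T)."""
    return K * math.tanh(gamma / T)

K, T = 100, 1.0
for gamma in (0.05, 0.2, 1.0, 5.0):
    # Compare the full response with its linear estimate K * gamma / T
    print(gamma, c_surf_binary(K, gamma, T), K * gamma / T)

# Susceptibility at zero field from a central finite difference
h = 1e-6
chi = (c_surf_binary(K, h, T) - c_surf_binary(K, -h, T)) / (2.0 * h)
print("chi:", chi, " K/T:", K / T)
```

For $\Gamma \lesssim 0.2\,T$ the two columns agree to within a few percent, while for $\Gamma = 5T$ the composition is pinned near $K$.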

Properties of the ground state are obtained by evaluating Eqs. (16) and (17) at $T = T_{\rm fr}$.
Dependence on the interaction parameter $\delta B$ is implicit (through $T_{\rm fr}$) below the freezing
transition. For $T > T_{\rm fr}$, however, surface response is insensitive to $\delta B$. In particular, $\beta_G$
vanishes, since no binary choice of $f(\sigma)$ can change the second moment of the contact energy
distribution. As a consequence, surface and volume behave independently for arbitrary $\Gamma$.

The opposite extreme of monomer diversity is described by a smooth form of $p(\sigma)$,
describing a continuous variety of chemical identities. We take a Gaussian distribution,
$p(\sigma) \propto \exp(-\sigma^2/2)$, as a simple example. For weak solvation, $\Gamma/T \ll 1$, $p(\sigma)$ is nowhere
significantly depleted, and Eq. (15) remains an appropriate approximation. Gaussian integration
yields

$$\frac{F_{\rm surf}}{K} \simeq -\frac{\Gamma^2}{2T} - \frac{\delta B^4}{T^3} \qquad (18)$$

to leading order in $\delta B/T$. (The basic assumption that monomer contacts are statistically
independent is plausible only for $\delta B/T_{\rm fr} = \sqrt{2s} \ll 1$ [8].) The first term in Eq. (18) again
reflects linear response. The second term describes the benefit in monomer contact energy
due to partial removal of some monomer types from the globule interior. This effect is
independent of solvation strength to leading order and dominates interfacial free energy for
very small $\Gamma$.
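The Gaussian integration can be verified numerically. The sketch below assumes the weak-depletion form $F_{\rm surf}/K = -\delta B^2/T - T\ln\int d\sigma\, p(\sigma)e^{-\eta(\sigma)}$ with $\eta(\sigma) = (\delta B/T)^2\sigma^2 - (\Gamma/T)\sigma$ and a unit-variance Gaussian $p(\sigma)$. It compares a trapezoid-rule evaluation, the closed-form Gaussian integral, and the leading-order estimate $-\Gamma^2/2T - \delta B^4/T^3$. Parameter values and grid settings are illustrative.

```python
import math

def f_surf_per_K_numeric(dB, gamma, T, half_width=12.0, n=4801):
    """Trapezoid-rule evaluation of -dB^2/T - T ln( integral p e^{-eta} )."""
    a = (dB / T) ** 2
    g = gamma / T
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        s = -half_width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-0.5 * s * s - a * s * s + g * s)
    integral = total * h / math.sqrt(2.0 * math.pi)
    return -T * a - T * math.log(integral)

def f_surf_per_K_closed(dB, gamma, T):
    """Same quantity from the exact Gaussian integral."""
    a = (dB / T) ** 2
    return (-T * a + 0.5 * T * math.log(1.0 + 2.0 * a)
            - gamma ** 2 / (2.0 * T * (1.0 + 2.0 * a)))

def f_surf_per_K_leading(dB, gamma, T):
    """Leading-order estimate: -gamma^2/(2T) - dB^4/T^3."""
    return -gamma ** 2 / (2.0 * T) - dB ** 4 / T ** 3

dB, gamma, T = 0.1, 0.1, 1.0
numeric = f_surf_per_K_numeric(dB, gamma, T)
closed = f_surf_per_K_closed(dB, gamma, T)
leading = f_surf_per_K_leading(dB, gamma, T)
print(numeric, closed, leading)
```

The residual difference between the closed form and the leading-order estimate is a cross term of order $\Gamma^2\delta B^2/T^3$, which is higher order in $\delta B/T$.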

For such a diverse set of monomer types, surface response saturates only when a molecule's
supply of the most solvophilic type is exhausted. Assuming weak depletion is clearly inappropriate
here. A maximally solvophilic surface is obtained when $Kf(\sigma) = Np(\sigma)$ for $\sigma > \sigma_{\max}$,
and $f(\sigma) = 0$ for $\sigma < \sigma_{\max}$. The cutoff point $\sigma_{\max}$ is determined by normalization:

$$\int_{\sigma_{\max}}^{\infty} d\sigma\, p(\sigma) = \frac{K}{N}. \qquad (19)$$

For Gaussian $p(\sigma)$, Eq. (19) gives $\sigma_{\max} \simeq \sqrt{2\ln(N/K)} \simeq \sqrt{(2/3)\ln N}$. Because this choice
of surface composition uniquely specifies a monomer set $G$, the associated entropy $s\{f\}$
vanishes. Free energy is then easily estimated from Eq. (9), giving $F_{\rm surf}/K \simeq -\Gamma\sigma_{\max}$. Comparing
this result with the free energy of linear response, we estimate that saturation occurs
around $\Gamma \simeq 2T\sigma_{\max} \sim T\sqrt{\ln N}$. Reaching this crossover may thus require much stronger
solvation, and result in more favorable surface energy, than in the binary case. The relevant
distinction between these distributions is the existence of extremely solvophilic monomers,
whose small numbers entail considerable entropic cost in constraining them to the surface.
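The cutoff $\sigma_{\max}$ can also be obtained without the large-$N$ approximation. The sketch below solves the normalization condition $\int_{\sigma_{\max}}^{\infty} d\sigma\, p(\sigma) = K/N$ for a unit-variance Gaussian by bisection on the complementary error function, and compares it with the leading-logarithm estimate $\sqrt{2\ln(N/K)}$. The sizes `N`, `K` and the helper names are illustrative assumptions.

```python
import math

def gaussian_tail(x):
    """Integral of the standard normal density from x to infinity."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def sigma_max_exact(N, K):
    """Solve gaussian_tail(x) = K/N by bisection (requires K/N < 1/2)."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gaussian_tail(mid) > K / N:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sigma_max_asymptotic(N, K):
    """Leading-logarithm estimate sqrt(2 ln(N/K))."""
    return math.sqrt(2.0 * math.log(N / K))

def mean_sigma_saturated(N, K):
    """Mean sigma on a maximally solvophilic surface:
    (N/K) * integral_{sigma_max}^inf sigma p(sigma) d sigma."""
    x = sigma_max_exact(N, K)
    return (N / K) * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

N, K = 10**6, 10**4
x_exact = sigma_max_exact(N, K)
x_asym = sigma_max_asymptotic(N, K)
print("sigma_max (bisection):", x_exact)
print("sigma_max (leading)  :", x_asym)
print("mean surface sigma   :", mean_sigma_saturated(N, K))
print("crossover Gamma / T  :", 2.0 * x_exact)
```

At these sizes the leading-logarithm estimate overshoots the bisection result noticeably, so the scaling $\Gamma \sim T\sqrt{\ln N}$ is best read as an asymptotic statement rather than a quantitative one for moderate $N$.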

Fig. 1 summarizes the mechanisms of surface response we have identified. These results
support a view of surface solvation energy and volume energy as statistically independent
random variables. In particular, the linear response corresponding to this notion is valid
over a wide range of temperature and solvation strength. Saturation at large $\Gamma$, though a
nonlinear effect, does not truly arise from correlation of surface and volume. It is instead a
consequence of the finitude of surface area or of the number of solvophilic monomers. The
regime of weak response, in which $F_{\rm surf}/K \sim -\delta B^4/T^3$, does reflect coupling of surface and
volume. But it involves monomer contact energies alone, as indicated by insensitivity to
$\Gamma$. Within our model, contributions from more intimate connections between surface and
volume are small compared to unity when $K \ll N$.

The diversity of amino acid monomers comprising proteins lies somewhere between those 
of binary and Gaussian distributions. The surface behavior we have described should thus 
be relevant for chains of these units arranged in random sequence. Specifically, we predict 
that preferential solvation must be much larger than typical thermal fluctuations in order to 
stabilize a strictly solvophilic surface. Sequences found in nature, however, are not random 
in at least one respect important to freezing. Their ground states lie well below the effective 
continuum of non-native energies. The influence of this energy gap on surface solvation 
requires a consideration of sequence design that is beyond this discussion. 

P.L.G. is an M.I.T. Science Fellow. 
FIG. 1: Response of a random heteropolymer to surface solvation, shown in the plane of temperature
$T$ and solvation strength $\Gamma$. Crossover lines are the result of equating free energies, or ground
state energies for $T < T_{\rm fr}$. For a binary distribution of monomer types, the weak response regime
is absent, and $\sigma_{\max} = 1$.



